171 research outputs found

A generalized Robinson-Foulds distance for labeled trees.

Author: Briand S.
Dessimoz C.
El-Mabrouk N.
Lafond M.
Lobinska G.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 18/11/2020

The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting "good" edges, i.e. edges shared between the two trees. We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf
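For readers unfamiliar with the classical (unlabeled) RF distance that this abstract extends, the following is a minimal sketch and not the authors' pylabeledrf code. It assumes rooted trees written as nested Python tuples with string leaf names, and it counts the clades found in exactly one of the two trees. The function names `clades` and `rf_distance` are hypothetical, the set-based approach is not the linear-time algorithm mentioned in the abstract, and node labels and flips are not modeled.

```python
def clades(tree):
    """Collect the non-trivial clades of a rooted tree.

    Assumed encoding (illustrative only): a leaf is a string and an
    internal node is a tuple of child subtrees.
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    # The root clade is shared by any two trees on the same leaf set.
    found.discard(all_leaves)
    return found


def rf_distance(tree_a, tree_b):
    """Rooted RF distance: number of clades present in exactly one tree."""
    return len(clades(tree_a) ^ clades(tree_b))


t1 = ((("a", "b"), "c"), "d")
t2 = ((("a", "c"), "b"), "d")
print(rf_distance(t1, t2))
```

Each clade unique to one tree corresponds to one edge that has to be contracted in that tree or extended in the other, which is the edit view of RF that the abstract builds on.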

Serveur académique lausannois

UCL Discovery

A generalized Robinson-Foulds distance for labeled trees

Author: Briand S
Dessimoz C
El-Mabrouk N
Lafond M
Lobinska G
Publication date: 18/11/2020

Background: The Robinson-Foulds (RF) distance is a well-established measure between phylogenetic trees. Despite a lack of biological justification, it has the advantages of being a proper metric and being computable in linear time. For phylogenetic applications involving genes, however, a crucial aspect of the trees ignored by the RF metric is the type of the branching event (e.g. speciation, duplication, transfer, etc). Results: We extend RF to trees with labeled internal nodes by including a node flip operation, alongside edge contractions and extensions. We explore properties of this extended RF distance in the case of a binary labeling. In particular, we show that contrary to the unlabeled case, an optimal edit path may require contracting “good” edges, i.e. edges shared between the two trees. Conclusions: We provide a 2-approximation algorithm which is shown to perform well empirically. Looking ahead, computing distances between labeled trees opens up a variety of new algorithmic directions. Implementation and simulations available at https://github.com/DessimozLab/pylabeledrf

UCL Discovery

Sorting by reversals, block interchanges, tandem duplications, and deletions

Author: D Bader
D Bertrand
D Christie
D Sankoff
E Tannier
G Blanc
H Nagamochi
I Elias
J Mixtacki
K Swenson
M Marron
M Ozery-Flato
Martin Bader
N El-Mabrouk
R Warren
S Hannenhalli
S Yancopoulos
T Hartman
V Bafna
X Chen
Y Han
Z Fu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Finding sequences of evolutionary operations that transform one genome into another is a classic problem in comparative genomics. While most genome rearrangement algorithms assume that there is exactly one copy of each gene in both genomes, this does not reflect the biological reality very well – most of the studied genomes contain duplicated gene content, which has to be removed before applying those algorithms. However, dealing with unequal gene content is a very challenging task, and only a few algorithms allow operations like duplications and deletions. Almost all of these algorithms restrict these operations to a fixed size. Results: In this paper, we present a heuristic algorithm to sort an ancestral genome (with unique gene content) into a genome of a descendant (with arbitrary gene content) by reversals, block interchanges, tandem duplications, and deletions, where tandem duplications and deletions are of arbitrary size. Conclusion: Experimental results show that our algorithm finds sorting sequences that are close to an optimal sorting sequence when the ancestor and the descendant are closely related. The quality of the results decreases when the genomes become more diverged or the genome size increases. Nevertheless, the calculated distances give a good approximation of the true evolutionary distances.
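To make the simplest of the operations named in this abstract concrete, here is a small sketch of a signed reversal together with a breakpoint count against the identity genome. It is not the paper's heuristic: it assumes unique gene content, represents a genome as a list of signed integers, and does not model block interchanges, tandem duplications, or deletions. The names `reverse` and `breakpoints` are illustrative.

```python
def reverse(genome, i, j):
    """Apply a signed reversal to genome[i..j] (inclusive).

    The segment is read in the opposite order and every gene in it
    changes orientation (sign).
    """
    segment = [-gene for gene in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def breakpoints(genome):
    """Count adjacencies that differ from the identity genome 1..n.

    The genome is framed with 0 and n+1; an adjacency (a, b) is
    conserved exactly when b == a + 1, which also covers reversed
    blocks such as (-3, -2).
    """
    framed = [0] + genome + [len(genome) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


genome = [1, -3, -2, 4]
print(breakpoints(genome))      # adjacencies (1, -3) and (-2, 4) are broken
sorted_genome = reverse(genome, 1, 2)
print(sorted_genome, breakpoints(sorted_genome))
```

A sorting sequence in this restricted setting is a list of such reversals that brings the breakpoint count to zero; the abstract's algorithm additionally has to handle genes that occur more than once.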

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

We present a data structure called a history graph that offers a practical basis for the analysis of genome evolution. It conceptually simplifies the study of parsimonious evolutionary histories by representing both substitutions and double cut and join (DCJ) rearrangements in the presence of duplications. The problem of constructing parsimonious history graphs thus subsumes related maximum parsimony problems in the fields of phylogenetic reconstruction and genome rearrangement. We show that tractable functions can be used to define upper and lower bounds on the minimum number of substitutions and DCJ rearrangements needed to explain any history graph. These bounds become tight for a special type of unambiguous history graph called an ancestral variation graph (AVG), which constrains in its combinatorial structure the number of operations required. We finally demonstrate that for a given history graph G, a finite set of AVGs describes all parsimonious interpretations of G, and this set can be explored with a few sampling moves. Comment: 52 pages, 24 figures

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

PubMed Central

eScholarship - University of California

Efficient algorithms for analyzing segmental duplications with deletions and inversions in genomes

Author: Benjamin J Raphael
CL Kahn
Crystal L Kahn
D Bertrand
D Sankoff
J Bailey
J Ma
K Chaudhuri
M Johnson
M Lajoie
M Marron
MA Alekseyev
N El-Mabrouk
O Elemento
P Pevzner
Shay Mozes
X Chen
Y Zhang
Z Jiang
Publication venue: BioMed Central
Publication date: 22/12/2009

Background: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is a model of repeated aggregation and subsequent duplication of genomic sequences. Results: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the most parsimonious way to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). Conclusion: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
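The idea of building a target string from copies of source substrings can be illustrated with a deliberately restricted toy version. The sketch below is not the paper's algorithm: it assumes that each copied substring may only be appended to the end of the growing target, and it ignores deletions and inversions. Under that assumption a greedy longest-match scan gives the minimum number of copies, because every prefix of a source substring is itself a source substring. The function name `append_only_copy_count` is hypothetical.

```python
def append_only_copy_count(source, target):
    """Minimum number of source substrings whose concatenation is target.

    Simplifying assumption (not from the paper): copies are only ever
    appended, so the target is cut greedily into longest matches.
    """
    count = 0
    pos = 0
    while pos < len(target):
        length = 0
        # Extend the current piece while it still occurs in the source.
        while (pos + length < len(target)
               and target[pos:pos + length + 1] in source):
            length += 1
        if length == 0:
            raise ValueError(f"{target[pos]!r} does not occur in the source")
        pos += length
        count += 1
    return count


print(append_only_copy_count("abcde", "abcdeab"))
```

The repeated substring tests make this scan quadratic or worse, which is acceptable for a toy example but not for genome-scale strings.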

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Genome rearrangements with duplications

Author: B Hiller
C Zheng
D Bertrand
D Bryant
D Sankoff
Drosophila 12 Genomes Consortium
F Cabanillas
F Mitelman
G Blanc
G Blin
J Salse
K Swenson
M Bader
M Marron
M Ozery-Flato
Martin Bader
N El-Mabrouk
S Gog
S Hannenhalli
S Yancopoulos
T Hartman
V Bafna
X Chen
Publication venue: BioMed Central
Publication date: 01/01/2010

Crossref

Springer - Publisher Connector

PubMed Central

A framework for orthology assignment from gene rearrangement data

Author: A. Caprara
B. Larget
B.M.E. Moret
C. Thach Nguyen
D. Bryant
D. Sankoff
D.A. Bader
G. Tesler
J. Earnest-DeYoung
J. Tang
J.L. Boore
K.M. Swenson
M. Blanchette
M. Marron
M.E. Cosner
N. El-Mabrouk
S. Hannenhalli
S.R. Downie
X. Chen
Publication venue: Springer
Publication date: 01/01/2005

Abstract. Gene rearrangements have successfully been used in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive when comparing nuclear genomes. The main challenge is how to deal with gene families, specifically, how to identify orthologs. While searching for orthologies is a common task in computational biology, it is usually done using sequence data. We approach that problem using gene rearrangement data, provide an optimization framework in which to phrase the problem, and present some preliminary theoretical results.

CiteSeerX

Crossref

Ancestral Genome Organization: An Alignment Approach

Author: Blanchette M.
Bourque G.
David Ardell
El-Mabrouk N.
Jiang M.
Krister Swenson
Nadia El-Mabrouk
Patrick Holloway
Pe'er I.
Swenson K.
Withers M.
Publication venue: 'Mary Ann Liebert Inc'

Crossref